Abstract. The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss these issues in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.
Key words and phrases: ... sampling weights, shrinkage.



1. BACKGROUND 

Survey weighting is a mess. It is not always clear 
how to use weights in estimating anything more com- 
plicated than a simple mean or ratios, and stan- 
dard errors are tricky even with simple weighted 
means. (Software packages such as Stata and SU- 
DAAN perform analysis of weighted survey data, 
but it is not always clear which, if any, of the avail- 
able procedures are appropriate for complex adjust- 
ment schemes. In addition, the construction of 
weights is itself an uncodified process.) Contrary to what is assumed by many theoretical statisticians, survey weights are not in general equal to inverse probabilities of selection but rather are typically constructed based on a combination of probability calculations and nonresponse adjustments.

Regression modeling is a potentially attractive al- 
ternative to weighting. In practice, however, the po- 
tential for large numbers of interactions can make 
regression adjustments highly variable. This paper 
reviews the motivation for hierarchical regression, 
combined with poststratification, as a strategy for 
correcting for differences between sample and pop- 
ulation. We sketch some directions toward a practi- 
cal solution, which unfortunately has not yet been 
reached. 

1.1 Estimating Population Quantities from a Sample

Our goal is to use sample survey data to estimate 
a population average or the coefficients of a regres- 
sion model. The regression framework also includes 
small-area estimation, since that is simply a regres- 
sion on a discrete variable corresponding to indica- 
tors for the small areas. 

We shall consider two running examples: a series of CBS/New York Times national polls from the 1988 election campaign, and the New York City Social Indicators Survey, a biennial survey of families that was conducted by Columbia University's School of Social Work (Garfinkel and Meyers, 1999; Meyers and Teitler, 2001; Garfinkel et al., 2003). Both sets of surveys used random digit dialing.

Fig. 1. The proportion of adults surveyed who answered yes in the Gallup Poll to the question, "Are you in favor of the death penalty for a person convicted of murder?" among those who expressed an opinion on the question. It would be interesting to estimate these trends in individual states.

For the pre-election polls, our quantity of primary interest is the proportion of people who support the Republican candidate for President in the country or in each state [or the proportion of voters who support the Republican candidate, which is a ratio: the proportion of people who will vote and support the Republican, divided by the proportion who will vote; it is straightforward to move from estimating a population mean to estimating this ratio, as discussed in the context of this example by Park, Gelman and Bafumi (2004)]. We would also like to use series of national polls to estimate state-by-state time trends, for example in the support for the death penalty over the past few decades. (See Figure 1 for the national trends.)

For the Social Indicators Survey, we are interested 
in population average responses to questions such as, 
"Do you rate the schools as poor, fair, good or very 
good?", average responses in subpopulations (e.g., 
the view of the schools among parents of school-age 
children), and so-called "analytical" studies that can
be expressed in terms of regressions (e.g., predicting 
total satisfaction given demographics and specific 
attitudes about health care, safety, etc.). In this ar- 
ticle, we focus on trends from 1999 to 2001, as mea- 
sured by changes in two successive Social Indicators 
Surveys, on a somewhat arbitrary selection of ques- 
tions chosen to illustrate the general concerns of the 
survey. 

Table 1 shows the questions, the estimated aver- 
age responses in each year, and the estimated differ- 
ences and standard errors as obtained using two dif- 
ferent methods of inference. This paper is centered 
on the puzzle of how these two estimation meth- 
ods differ. We shall get back to this question in a 
moment after reviewing some basic ideas in survey 
sampling inference. 

1.2 Poststratification and Weighting 

Naive promulgators of Bayesian inference — or the 
modeling approach to inference in general — used to 
say that the method of data collection was irrele- 
vant to estimation from survey data. All that mat- 
ters, from this slightly misguided perspective, is the 
likelihood, or the model of how the data came to be. 
However, as has been pointed out by Rubin (1976), 
the usual Bayesian or likelihood analysis implicitly 
assumes the design is "ignorable," which in a sampling context roughly means that the analysis includes all variables that affect the probability of a person being included in the survey (see Chapter 7 of Gelman et al., 2004, for a review).

Table 1

Question                          Weighted averages   (a) time change   (b) linear regression   (a) time change   (b) logistic regression
                                  1999      2001      in percent        coefficient of time     on logit scale    coefficient
Adult in good/excellent health    75%       78%       3.4% (2.4%)       6.6% (1.4%)             0.19 (0.13)
Child in good/excellent health    82%       84%       1.7% (1.5%)       1.2% (1.3%)             0.24 (0.21)       0.18 (0.20)
Neighborhood is safe/very safe    77%       81%       4.5% (2.3%)       4.1% (1.5%)             0.27 (0.14)       0.27 (0.10)

Estimates for some responses from two consecutive waves of the New York City Social Indicators Survey, and estimated changes, with standard errors in parentheses. Changes are estimated in percentages and on the logit scale. On each scale, two estimates are presented: (a) simple differences in weighted means and (b) regression controlling for the variables used in the weighting. Approaches (a) and (b) can give similar results but sometimes are much different.

In a regression context, the analysis should in- 
clude, as "X variables," everything that affects sam- 
ple selection or nonresponse. Or, to be realistic, all 
variables should be included that have an impor- 
tant effect on sampling or nonresponse, if they also 
are potentially predictive of the outcome of interest. 
In a public survey such as the CBS polls, a good 
starting point is the set of variables used in their 
weighting scheme: number of adults and number of 
telephone lines in the sampled household; region of 
the country; and sex, ethnicity, age and education 
level of the respondent (see Voss, Gelman and King, 
1995). For the Social Indicators Survey, we did our 
own weighting (Becker, 1998) using similar informa- 
tion: number of telephone lines (counted as 1/2 for 
families with intermittent phone service), number of 
adults and children in the family, and ethnicity, age 
and education of the head of household. Weights for 
each survey are constructed by multiplying a series 
of factors. 

In the sampling context, ignorability corresponds 
to the assumption of simple random sampling within 
poststratification cells or, more generally, the as- 
sumption that, within poststratification cells, the 
relative probabilities of selection are equal. (This 
is the information used in constructing sampling 
weights.) Adjustment for unit nonresponse is im- 
plicit in this framework; for example, by poststrati- 
fying on sex, an analysis adjusts simultaneously for 
differences between men and women in probability 
of inclusion in the sample (i.e., probability of being 
sampled, multiplied by probability of responding). 
We shall ignore item nonresponse (or, equivalently, 
suppose any missing data have been randomly im- 
puted; see the discussion in Rubin, 1996). 

We now review the unified notation for poststrat- 
ification and survey weighting of Little (1991, 1993) 
and Gelman and Carlin (2002); see also Holt and 
Smith (1979). Here we use the notation y, z for vari- 
ables that are observed in the sample only, and X 
for variables that are observed in the sample and 
known in the population. For simplicity, we assume 
throughout this article that the population size is 
large, so that the finite-population quantities of in- 
terest (averages, population totals or regression co- 
efficients) are essentially the same as the correspond- 
ing superpopulation quantities. 



Poststratification. The purpose of poststratifica- 
tion is to correct for known differences between sam- 
ple and population. In the basic formulation, we 
have variables X whose joint distribution in the pop- 
ulation is known, and an outcome y whose popu- 
lation distribution we are interested in estimating. 
We shall assume X is discrete, and label the possible categories of X as poststratification cells j, with population sizes $N_j$ and sample sizes $n_j$. In this notation, the total population size is $N = \sum_{j=1}^{J} N_j$ and the sample size is $n = \sum_{j=1}^{J} n_j$. The implicit model of poststratification is that the data are collected by simple random sample within each of the J poststrata. The assignment of sample sizes to poststrata is irrelevant. In fact, classical stratification (in which the sampling really is performed within strata) is a special case of poststratification as we formulate it. We assume the population size $N_j$ of each category j is known. These categories include all the cross-classifications of the predictors X. [In some cases
the cell populations are unknown and must be esti- 
mated. For example, in the Social Indicators Survey, 
we adjust to estimated demographics from the Cur- 
rent Population Survey, which includes about 2000 
New York City residents each year. This is enough 
to give reliable estimates of one-way and two-way 
margins (e.g., the proportion of city residents who 
are white females, white males, black females, black 
males, etc.), but the counts are too sparse to directly 
estimate deep interactions (e.g., the proportion who 
are white females, 30-45, married, with less than a 
high school education, etc.). The usual practical solution in this case is to poststratify on the margins (e.g., raking; see, e.g., Deville, Särndal and Sautory, 1993). If the whole table of population counts is required, it can be estimated using iterative proportional fitting (Deming and Stephan, 1940), which sets interactions to be as small as possible while being consistent with the available population data. For this paper, we shall ignore this difficulty and treat the full vector of $N_j$'s as known.]
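The raking and iterative proportional fitting step mentioned above can be sketched as follows. This is a minimal illustration on a two-way table, not the procedure used for the Social Indicators Survey; the function name and the margins are hypothetical. Starting from a table of ones, the fitted table has no interaction: each cell equals its row total times its column total divided by N.

```python
def rake_two_way(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (Deming and Stephan, 1940) for a 2-D table.

    seed: list of lists of positive starting values (all ones, or sample counts).
    row_targets, col_targets: known population margins with equal totals.
    Returns a new table whose row and column sums match the targets.
    """
    table = [list(row) for row in seed]
    for _ in range(max_iter):
        # Scale every row to its known total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        # Scale every column to its known total (this can disturb the row sums).
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        # Converged once the row sums still match after the column step.
        if max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets)) < tol:
            break
    return table


# Hypothetical margins: sex (2 levels) by age group (3 levels), N = 1000.
sex_totals = [480.0, 520.0]
age_totals = [300.0, 400.0, 300.0]
cells = rake_two_way([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]], sex_totals, age_totals)
# With a uniform seed, cells[i][j] is approximately sex_totals[i] * age_totals[j] / 1000.
```

Passing a sample cross-tabulation as `seed` instead of ones would rake the sample counts to the same margins while preserving the sample's interaction structure.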

The population mean of any survey response can be written as a sum over poststrata,

definition of population mean: $$\theta = \frac{\sum_{j=1}^{J} N_j \theta_j}{\sum_{j=1}^{J} N_j}, \tag{1}$$

with corresponding estimate

poststratified estimate: $$\hat{\theta}^{PS} = \frac{\sum_{j=1}^{J} N_j \hat{\theta}_j}{\sum_{j=1}^{J} N_j}. \tag{2}$$

We use the general notation $\theta_j$ rather than $\bar{Y}_j$ to allow for immediate generalization to other estimands such as regression coefficients.
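As a small numerical check on (2), the sketch below computes a poststratified estimate from hypothetical cell population counts and cell estimates. The function name is illustrative; the cell estimates could be raw cell means or, as later in this article, model-based estimates.

```python
def poststratified_estimate(pop_counts, cell_estimates):
    """Equation (2): sum_j N_j * theta_j / sum_j N_j.

    pop_counts: known population sizes N_j, one per poststratification cell.
    cell_estimates: estimates theta_j for the same cells, in the same order.
    """
    if len(pop_counts) != len(cell_estimates):
        raise ValueError("need exactly one estimate per poststratification cell")
    return sum(n * t for n, t in zip(pop_counts, cell_estimates)) / sum(pop_counts)


# Hypothetical two-cell example (women, men): N_j and sample cell means.
N_j = [520, 480]
theta_j = [0.60, 0.40]
estimate = poststratified_estimate(N_j, theta_j)  # (520*0.60 + 480*0.40) / 1000 = 0.504
```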

Weighting. When you look at sample survey data from a public-use dataset, the "survey weight" looks like a unit-level characteristic (just one more column in the data), and it is easy to think of it almost as a survey response, $w_i$. In this context it seems natural to use weighted averages of the form

weighted average: $$\bar{y}^{w} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}. \tag{3}$$

But survey weights are not attributes of individual units; they are constructions based on an entire survey. Within any poststratification cell, all units have the same poststratification weight adjustment. (In theory, continuously varying survey weights could arise from a survey with a continuous range of sampling probabilities. For example, one could imagine a survey of college-bound students where the probability of selection is a continuous function of background variables [e.g., $\Pr(\text{selection}) = \text{logit}^{-1}(a + b \cdot \text{SAT})$]. Or one could model nonresponse as a continuous function of predictors such as age and previous health status in a medical survey. These continuous weights do not come up much in the sorts of social surveys under consideration in this article, but they are interesting research directions that are potentially important in other areas of application.) We shall refer to unit weights $w_i$, $i = 1, \dots, n$, and cell weights $W_j = n_j w_i$ for units i within cell j, so that (3) can equivalently be written as $\bar{y}^{w} = \sum_{j=1}^{J} W_j \bar{y}_j / \sum_{j=1}^{J} W_j$.



Survey weights in general depend on the actual 
data collected as well as on the design of the sur- 
vey. For example, consider the seven CBS polls con- 
ducted during the week before the 1988 Presiden- 
tial election. These surveys had identical designs and 
targeted the same population. However, the weight- 
ing factor assigned to men (compared to a factor 
of 1 for women) ranges from 1.10 to 1.27
among the seven surveys. The different samples hap- 
pened to contain different ratios of men to women 
and hence needed different adjustments. 
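The dependence of the weights on the realized sample shows up even in a stripped-down version of the sex adjustment. The sketch below assumes a single poststratification variable (sex), a hypothetical population share of men and hypothetical sample counts; it is not the CBS weighting procedure, which adjusted for several variables at once.

```python
def male_weight_factor(n_men, n_women, pop_share_men):
    """Poststratification factor for men relative to a factor of 1 for women.

    Each sex is weighted by (population share) / (sample share); dividing the
    men's value by the women's value normalizes the women's factor to 1.
    """
    n = n_men + n_women
    factor_men = pop_share_men / (n_men / n)
    factor_women = (1 - pop_share_men) / (n_women / n)
    return factor_men / factor_women


# Two hypothetical polls with identical designs but different realized samples.
poll_a = male_weight_factor(n_men=450, n_women=550, pop_share_men=0.48)  # about 1.13
poll_b = male_weight_factor(n_men=420, n_women=580, pop_share_men=0.48)  # about 1.27
```

The design and the population target are the same in both calls; only the sample sex ratio differs, and that alone moves the factor.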

Weighting based on sampling probabilities. A fur- 
ther complication is that survey weighting is com- 
monly performed on some variables using inverse- 
sampling probabilities rather than poststratification. 



For example, in the Social Indicators Survey we as- 
signed weights of 1/2, 1 and 2 for households with 
multiple phone lines, exactly one phone line and in- 
termittent phone service, respectively. Unlike post- 
stratification weights, these weighting factors are 
fixed and do not depend on the sample. 

These inverse-probability weights are important 
in some survey designs and are sometimes portrayed 
as producing unbiased estimates, but this unbiased- 
ness breaks down in the presence of nonresponse. For 
example, some telephone surveys give each respon- 
dent a weighting factor proportional to the num- 
ber of adults in his or her household; this is an 
inverse-probability weight given that all households 
are equally likely to be selected (after correcting for 
the number of telephone lines) and the respondent is 
selected at random among the adults in the house- 
hold. In practice, however, such weighting overrep- 
resents persons in large households, presumably be- 
cause it is easier to find someone at home from a 
household where more adults are living. Poststratification weights (which roughly approximate weighting by the square root of the number of adults in the household) give a better fit to the population (Gelman and Little, 1998).

In this article we shall assume that any factors 
associated with sampling weights have already been 
folded into the poststratification. For example, consider a survey that is poststratified into 16 categories (2 sexes × 2 ethnicity categories × 4 age ranges), and also has telephone weights of 1/2, 1 and 2. The three categories of telephone weights would then represent another dimension in the adjustment, thus giving a total of 48 categories. We recognize that treating this weighting as pure poststratification is an oversimplification; for one thing, sampling variances for poststratified estimates are generally different from those for fixed weights (see, e.g., Binder, 1983; Lu and Gelman, 2003).
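Folding the telephone factor in as one more poststratification dimension simply multiplies the number of cells. A quick enumeration confirms the count; the category labels below are placeholders, and only the numbers of levels (2, 2, 4 and 3) come from the example.

```python
from itertools import product

# Placeholder labels; only the number of levels in each factor is taken from the text.
sexes = ["sex_1", "sex_2"]
ethnicities = ["eth_1", "eth_2"]
age_ranges = ["age_1", "age_2", "age_3", "age_4"]
phone_factors = [0.5, 1.0, 2.0]  # telephone weights of 1/2, 1 and 2

cells = list(product(sexes, ethnicities, age_ranges, phone_factors))
print(len(cells))  # 2 * 2 * 4 * 3 = 48
```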

1.3 Competing Methods of Estimation: Weighted Averages, Weighted Regression and Unweighted Regression Controlling for X

Many researchers have noted the challenge of us- 
ing survey weights in regression models (as reviewed, 
e.g., by DuMouchel and Duncan, 1983; Kish, 1992; 
Pfeffermann, 1993). For the goal of estimating a 
population mean, it is standard to use the weighted 
average (3), but it is not so clear what to do in more 
complicated analyses. For example, when estimating a regression of y on z, one recommended approach is to use weighted least squares, and another option is to perform unweighted regression of y on z, also controlling for the variables X that are used in the weighting.

Table 2

       respid  org     year  survey  y   state  edu  age  female  black  adults  phones  weight
11352  6140    cbsnyt  7     9158    NA  7      3    1    1       0      2       1       923
11353  6141    cbsnyt  7     9158    1   39     4    2    1       0      2       1       558
11354  6142    cbsnyt  7     9158    0   31     2    4    1       0      1       1       448
11355  6143    cbsnyt  7     9158    0   7      3    1    1       0      2       1       923
11356  6144    cbsnyt  7     9158    1   33     2    2    1       0      1       1       403

Data from the first five respondents of a CBS pre-election poll. The weights are listed as just another survey variable, but they are actually constructed after the survey has been conducted, so as to match sample with known population information.

Table 3

                        True        Different standard error estimates
Opinion of NYC          standard    assuming    conditioning    assuming    design-
                        error       SRS         on weights      inv-prob    based
Became a better place   2.2%        1.2%        2.5%            1.9%        2.1%
Remained the same       2.0%        1.2%        2.3%            1.6%        1.9%
Gotten worse            2.0%        1.2%        2.4%            1.7%        2.0%

From a simulation study: true standard error and four different standard error estimates for a question on the Social Indicators Survey. Ignoring the weighting or treating the weights as constant underestimates uncertainty, whereas uncertainty is overestimated by treating the weights as inverse probabilities. Accurate standard errors can be obtained using a jackknife-like procedure that explicitly takes account of the design of the weighting procedure. From Lu and Gelman (2003).

Computing standard errors is not trivial for 
weighted estimates, whether means or regressions, 
because the weights themselves generally are ran- 
dom variables that depend on the data (Yung and 
Rao, 1996). In particular, correct classical standard 
errors cannot simply be obtained from the data and 
the weights; one also needs to know the procedure 
used to create the weights. Table 3 illustrates prob- 
lems with some variance estimates that do not ac- 
count for the weighting design. Similarly, with re- 
gressions, simple weighted regression procedures do 
not in general give correct standard errors. 
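To make the point concrete, the sketch below compares two bootstrap standard errors for a poststratified mean on simulated data: one that rebuilds the poststratification weights inside every replicate (so the weighting procedure is part of what is resampled), and one that carries each unit's original weight along as a fixed number. The data, cell shares, function names and bootstrap settings are all hypothetical, and this is not the jackknife-like method of Lu and Gelman (2003); it only illustrates that the data plus the weights do not pin down the standard error, because the two replicate schemes need not agree.

```python
import random
import statistics


def poststratified_mean(sample, pop_share):
    """sum_j P_j * ybar_j, with the cell means recomputed from `sample`.

    sample: list of (cell, y) pairs; pop_share: dict cell -> N_j / N (sums to 1).
    Assumes every cell in pop_share appears in the sample.
    """
    counts, sums = {}, {}
    for cell, y in sample:
        counts[cell] = counts.get(cell, 0) + 1
        sums[cell] = sums.get(cell, 0.0) + y
    return sum(p * sums[c] / counts[c] for c, p in pop_share.items())


def unit_weights(sample, pop_share):
    """w_i = P_j / (n_j / n): these depend on the realized cell counts n_j."""
    n = len(sample)
    counts = {}
    for cell, _ in sample:
        counts[cell] = counts.get(cell, 0) + 1
    return [pop_share[c] * n / counts[c] for c, _ in sample]


def bootstrap_ses(sample, pop_share, n_boot=1000, seed=0):
    """Return (SE with weights rebuilt per replicate, SE with original weights held fixed)."""
    rng = random.Random(seed)
    n = len(sample)
    w = unit_weights(sample, pop_share)
    rebuilt, fixed = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot = [sample[i] for i in idx]
        if {c for c, _ in boot} != set(pop_share):
            continue  # skip a replicate that happens to lose an entire cell
        rebuilt.append(poststratified_mean(boot, pop_share))
        wb = [w[i] for i in idx]
        fixed.append(sum(wi * y for wi, (_, y) in zip(wb, boot)) / sum(wb))
    return statistics.stdev(rebuilt), statistics.stdev(fixed)


# Simulated sample: cell "a" is oversampled relative to its 50% population share.
gen = random.Random(42)
sample = [("a", gen.gauss(1.0, 1.0)) for _ in range(150)]
sample += [("b", gen.gauss(3.0, 1.0)) for _ in range(50)]
shares = {"a": 0.5, "b": 0.5}
se_rebuilt, se_fixed = bootstrap_ses(sample, shares)
```

On the original sample the two point estimates coincide; they diverge only in how the replicates treat the weights.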

1.4 The Crucial Role of Interactions 

Consider a regression of y on z, estimated in some 
way from a survey where inclusion probabilities de- 
pend on X. In general, y can depend on both X and 
z, in which case the appropriate way to estimate the 
regression of y on z is to regress y on X, z and then 
average over the population distribution of X. In general, estimating the regression of y on z requires
estimation of the relation between z and X as well 
(Graubard and Korn, 2002). Because of the poten- 
tial dependence of z and X, it can be important to 
include interactions between these predictors in the 
model for y, even if the ultimate goal is simply to 
estimate the relation between y and z. 

In our survey adjustment framework, once a model 
includes interactions, poststratification is necessary 
in order to estimate population regression coeffi- 
cients. For a simple example, suppose we are inter- 
ested in the population regression of log earnings on 
height (in inches), using a survey that is adjusted 
to match the proportion of men and women in the 
population. The estimated regression (see Gelman 
and Hill, 2007) including the interaction is 

$$y = \log(\text{earnings}) = 8.4 + 0.017 \cdot \text{height} - 0.079 \cdot \text{male} + 0.007 \cdot \text{height} \cdot \text{male} + \text{error}.$$

For any given height z, the expected value of log earnings is

$$E(y \mid z) = 8.4 + 0.017 z - 0.079 \, E(\text{male} \mid \text{height} = z) + 0.007 z \, E(\text{male} \mid \text{height} = z). \tag{4}$$

Here, $E(\text{male} \mid \text{height} = z)$ would be estimated from the survey itself; most likely we would do this by fitting a linear regression (with Gaussian errors) of height given sex, and then simply using Bayes' rule, along with the population proportions of men and women, to compute the conditional probability.
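The computation just described can be sketched as below. The fitted coefficients are the ones quoted above; the population share of men and the two Gaussian height distributions are hypothetical placeholders for quantities that would be estimated from the survey. Evaluating E(y|z) at evenly spaced heights shows that it is not linear in z.

```python
from statistics import NormalDist

# Hypothetical inputs (not from the source): share of men in the population and
# Gaussian height distributions by sex, in inches, as if fit to the survey.
P_MALE = 0.48
HEIGHT_MEN = NormalDist(mu=69.5, sigma=2.9)
HEIGHT_WOMEN = NormalDist(mu=64.0, sigma=2.7)


def prob_male_given_height(z):
    """Bayes' rule: Pr(male | height = z) from the two height densities."""
    m = P_MALE * HEIGHT_MEN.pdf(z)
    f = (1 - P_MALE) * HEIGHT_WOMEN.pdf(z)
    return m / (m + f)


def expected_log_earnings(z):
    """Equation (4), with the interaction-model coefficients quoted in the text."""
    pm = prob_male_given_height(z)
    return 8.4 + 0.017 * z - 0.079 * pm + 0.007 * z * pm


# Equal 6-inch steps in height give unequal changes in E(y | z).
step_low = expected_log_earnings(66) - expected_log_earnings(60)
step_high = expected_log_earnings(72) - expected_log_earnings(66)
```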

The conditional expectation (4) is not, in general, 
a linear function of z. Thus, although we can define 
the population regression of log earnings on height — 
it is the result of fitting a simple linear regression of 
y on z to the entire population — it is not clear why 
it should be of any interest. 

This difficulty in interpreting regression coefficients 
in the context of survey adjustments is one reason 
we have been careful in Section 1.1 to consider es- 
timands that are simple comparisons of population 
averages. We illustrate with the goal of estimating 
the average difference in log earnings between whites 
and nonwhites; this is also a regression, but because 
the predictor z is binary, it is defined unambiguously 
as a difference. Again, we suppose for simplicity that 
the survey is adjusted only for sex. The estimated 
regression fit, including the interaction, is 

$$y = \log(\text{earnings}) = 9.5 - 0.02 \cdot \text{white} + 0.20 \cdot \text{male} + 0.41 \cdot \text{white} \cdot \text{male} + \text{error}.$$

The population difference in log earnings is then

$$E(y \mid \text{white} = 1) - E(y \mid \text{white} = 0) = -0.02 + 0.20 \big( E(\text{male} \mid \text{white} = 1) - E(\text{male} \mid \text{white} = 0) \big) + 0.41 \, E(\text{male} \mid \text{white} = 1),$$

and the factors E(male|white = 0) and E(male| 
white = 1) can be estimated from the data. More 
generally, this example illustrates that, once we fit 
an interaction model in a survey adjustment con- 
text, we cannot simply consider a single regression 
coefficient (in this case, for white) but rather must 
also use the interacted terms in averaging over post- 
stratification cells. 
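The averaging step can be sketched as follows, using the coefficients quoted above. The within-group shares of men are hypothetical placeholders for the estimates that would come from the data. The second function recomputes the same gap by averaging the model's predictions over sex directly, as a check on the displayed formula; note how far the result is from the coefficient for white (-0.02) on its own.

```python
# Coefficients of the fitted interaction model quoted in the text.
INTERCEPT, B_WHITE, B_MALE, B_WHITE_MALE = 9.5, -0.02, 0.20, 0.41


def gap_from_formula(p_male_white, p_male_nonwhite):
    """E(y | white = 1) - E(y | white = 0), using the displayed formula."""
    return (B_WHITE
            + B_MALE * (p_male_white - p_male_nonwhite)
            + B_WHITE_MALE * p_male_white)


def gap_by_averaging(p_male_white, p_male_nonwhite):
    """Same quantity: average the model's predictions over sex within each group."""
    def predict(white, male):
        return INTERCEPT + B_WHITE * white + B_MALE * male + B_WHITE_MALE * white * male
    e_white = p_male_white * predict(1, 1) + (1 - p_male_white) * predict(1, 0)
    e_nonwhite = p_male_nonwhite * predict(0, 1) + (1 - p_male_nonwhite) * predict(0, 0)
    return e_white - e_nonwhite


# Hypothetical shares of men among whites and nonwhites (placeholders for survey estimates).
gap = gap_from_formula(0.49, 0.46)  # -0.02 + 0.20 * 0.03 + 0.41 * 0.49 = 0.1869
```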

Our focus in this article is on the relation be- 
tween the model for the survey response and the 
corresponding weighted-average estimate. The ultimate goal is to have a model-based procedure for constructing survey weights, or conversely to set up
a framework for regression modeling that gives ef- 
ficient and approximately unbiased estimates in a 
survey-adjustment context. 

2. THE CHALLENGE 
2.1 Estimating Simple Averages and Trends 

We now return to the example of Table 1. The goal is to estimate $\bar{y}^{2001} - \bar{y}^{1999}$, the change in population average response between two waves of the Social Indicators Survey. This can be formulated as the coefficient $\beta_1$ in a regression of y on time: $y = \beta_0 + \beta_1 z + \text{error}$, where the data from the two surveys are combined, and z = 0 and 1 for respondents of the 1999 and 2001 surveys, respectively.

A more general model is $y = \beta_0 + \beta_1 z + \beta_2 X + \text{error}$, where $\beta_2$ is a vector of coefficients for the variables X used in the weighting. Now the quantity of interest is $\beta_1 + \beta_2 (\bar{X}^{2001} - \bar{X}^{1999})$, to account for demographic changes between the two years. For New York City between 1999 and 2001, these demographic changes were minor, and so it is reasonable to simply consider $\beta_1$ to be the quantity of interest.

This brings us to the puzzle of Table 1. For each of three binary outcomes y, we compute the weighted mean for each year, $\bar{y}^{1999}$ and $\bar{y}^{2001}$, and two estimates of the change:

• Our first estimate is the simple difference, $\bar{y}^{2001} - \bar{y}^{1999}$, with standard error $\sqrt{\operatorname{var}(\bar{y}^{2001}) + \operatorname{var}(\bar{y}^{1999})}$, where the sampling variances are computed using the design of the weights (as in the rightmost column in Table 3).

• Our other estimate is obtained by linear regression. We combine the data from the two surveys into a single vector, $y = (y^{1999}, y^{2001})$, and create an associated indicator vector z that equals 0 for the data from 1999 and 1 for the data from 2001. We fit a linear regression of y on z, also controlling for the variables X used in the weighting. (These X variables are number of adults in the household, number of children in the family, number of telephone lines, marital status, and sex, age, ethnicity and education, and ethnicity × education for the head of household.) To estimate the change from 1999 to 2001, we use the coefficient of z, with standard error automatically coming from the (unweighted) regression.
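The two estimators can be sketched on simulated data as follows. The sketch assumes a single binary weighting variable x with a known population share, builds poststratification weights separately in each wave, and omits standard errors; the function names and data-generating values are hypothetical and include an x-by-time interaction, which is one way the two estimates can come apart.

```python
import numpy as np


def weighted_mean(y, w):
    """Equation (3): sum_i w_i y_i / sum_i w_i."""
    return float(np.sum(w * y) / np.sum(w))


def ps_weights(x, pop_share_x1):
    """Poststratification weights on one binary variable: population share / sample share."""
    share = x.mean()
    return np.where(x == 1, pop_share_x1 / share, (1 - pop_share_x1) / (1 - share))


def two_estimates(y99, x99, y01, x01, pop_share_x1):
    """(a) change in weighted means; (b) coefficient of z in an unweighted regression on (1, z, x)."""
    est_a = (weighted_mean(y01, ps_weights(x01, pop_share_x1))
             - weighted_mean(y99, ps_weights(x99, pop_share_x1)))
    y = np.concatenate([y99, y01]).astype(float)
    z = np.concatenate([np.zeros(len(y99)), np.ones(len(y01))])
    x = np.concatenate([x99, x01]).astype(float)
    design = np.column_stack([np.ones(len(y)), z, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return est_a, float(beta[1])


# Simulated waves: x = 1 is undersampled (30% of sample vs 50% of population),
# and the time change is larger when x = 1 (a hypothetical x-by-time interaction).
rng = np.random.default_rng(0)
x99 = rng.binomial(1, 0.3, size=1500)
x01 = rng.binomial(1, 0.3, size=1500)
y99 = rng.binomial(1, 0.70 + 0.10 * x99)
y01 = rng.binomial(1, 0.72 + 0.18 * x01)
est_a, est_b = two_estimates(y99, x99, y01, x01, pop_share_x1=0.5)
```

Estimate (a) targets the population-average change directly, while (b) is a single time coefficient from a model without the interaction, so the two need not agree.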

As indicated in the third and fourth columns of Table 1, the regression coefficient and the change in weighted averages tend to have the same sign, but the two estimates sometimes differ quite a bit in magnitude. (Similar results are obtained if we work on the logit scale, as can be seen from the final two columns of the table.)

What should we believe? For this particular exam- 
ple, the direct analysis of weighted averages seems 
more believable to us, since we specifically created 
the weighting procedure for the goal of estimating 
these citywide averages. More generally, however, 
using weighted averages is awkward and we would 
prefer to use the more general techniques of regres- 
sion and poststratification. 

Where do we go from here? We would like an 
approach to statistical analysis of survey data that 
gives the right answers for simple averages and com- 
parisons, and can be smoothly generalized to more 
complicated estimands. 

2.2 Deep Poststratification 

One of the difficulties of survey weighting is that 
the number of poststratification cells can quickly be- 
come large, even exceeding the number of respon- 
dents. This leads naturally to multilevel modeling 
to obtain stable estimates in all the poststratifica- 
tion cells, even those with zero or one respondent. 
Choices must then be made in the modeling of in- 
teractions. 
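A minimal sketch of the idea, assuming the simplest normal hierarchical model for cell means with known hyperparameters (mu, sigma, tau): each cell mean is shrunk toward mu in proportion to how little data the cell has, cells with no respondents fall back to mu, and the results are then poststratified with (2). The function name, cell data and population counts are hypothetical; in practice the hyperparameters would be estimated and the model would typically include demographic predictors rather than a single common mean.

```python
def pooled_cell_means(cell_data, mu, sigma, tau):
    """Posterior means of theta_j for y_ij ~ N(theta_j, sigma^2), theta_j ~ N(mu, tau^2).

    cell_data: dict cell -> list of responses (possibly empty).
    mu, sigma, tau: hyperparameters, treated here as known.
    """
    est = {}
    for cell, ys in cell_data.items():
        n = len(ys)
        data_prec = n / sigma ** 2   # zero for an empty cell, so the estimate is mu
        prior_prec = 1 / tau ** 2
        ybar = sum(ys) / n if n else 0.0
        est[cell] = (data_prec * ybar + prior_prec * mu) / (data_prec + prior_prec)
    return est


# Hypothetical cells with 4, 1 and 0 respondents.
cell_data = {"a": [2.0, 4.0, 3.0, 3.0], "b": [5.0], "c": []}
pop_counts = {"a": 500, "b": 300, "c": 200}
theta = pooled_cell_means(cell_data, mu=3.0, sigma=1.0, tau=1.0)
ps = sum(pop_counts[c] * theta[c] for c in pop_counts) / sum(pop_counts.values())
# theta is {"a": 3.0, "b": 4.0, "c": 3.0} and ps is 3.3
```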

For example, in our time-trend estimation problem, we could model $y = \beta_0 + \beta_1 z + \beta_2 X + \beta_3 X z + \text{error}$, where $\beta_3$ is a vector of coefficients for the interaction of X and time. We would then be interested in $\beta_1 + \beta_2 (\bar{X}^{2001} - \bar{X}^{1999}) + \beta_3 \bar{X}^{2001}$ (as in the example at the end of Section 1.4). Where should the interaction modeling stop? A simulation study
(Cook and Gelman, 2006) suggests that, in this ex- 
ample, efficient and approximately unbiased estima- 
tors are obtained by including interactions of the 
time indicator with all the survey adjustment fac- 
tors; as a general approach, however, including all 
interactions can yield unstable estimates. The prac- 
tical problem of adjusting for survey nonresponse 
leads to general questions of inference under multi- 
way interactions, an issue that becomes even more 
relevant in small-area estimation. 
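For the interacted trend model above, the quantity of interest can be computed from a least-squares fit as in the sketch below. The function name and arguments are illustrative; it assumes the population means of X in each year are known, and the usage line checks it on a tiny noiseless dataset where the answer is known in advance.

```python
import numpy as np


def interacted_trend(y, z, X, xbar_1999, xbar_2001):
    """Fit y = b0 + b1*z + X b2 + (X*z) b3 by least squares; return
    b1 + b2 . (xbar_2001 - xbar_1999) + b3 . xbar_2001.

    y: (n,) outcomes; z: (n,) time indicator (0 = 1999, 1 = 2001);
    X: (n, k) weighting variables; xbar_*: (k,) population means of X.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), z, X, X * z[:, None]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    b1, b2, b3 = beta[1], beta[2:2 + k], beta[2 + k:]
    return float(b1 + b2 @ (xbar_2001 - xbar_1999) + b3 @ xbar_2001)


# Noiseless check: y = 1 + 0.5 z + 2 x + 0.3 x z, so the answer is
# 0.5 + 2 * (0.5 - 0.4) + 0.3 * 0.5 = 0.85.
z = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
x = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
y = 1 + 0.5 * z + 2 * x + 0.3 * x * z
change = interacted_trend(y, z, x[:, None], np.array([0.4]), np.array([0.5]))
```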

Gelman and Carlin (2002) and Park, Gelman and 
Bafumi (2004) discuss the estimation of state-level 
opinions from national polls, using a hierarchical 
logistic regression with demographics and state effects, followed by poststratification on Census population totals for 64 demographic categories in each of the 50 states. The method worked well, but it is
not clear how it would perform if the model included 
interactions of demographic and state effects. 

3. USING REGRESSION MODELING TO CONNECT WEIGHTING AND POSTSTRATIFICATION

When cell means are estimated using certain lin- 
ear regression models, poststratified estimates can 
be interpreted as weighted averages (Little, 1991, 
1993). The idea is to work with the poststratified estimate (2), an average over cell estimates $\hat{\theta}_j$, with the regression model providing the $\hat{\theta}_j$'s based on characteristics of the cells j. Under certain conditions, the poststratified estimate can be reinterpreted as a weighted average of the form (3), and then we can solve for the cell weights $W_j$ and the unit weights $w_i$.

3.1 Classical Models 

Full poststratification. The simplest case is full poststratification of raw data, in which case the cell estimates are the cell means, $\hat{\theta}_j = \bar{y}_j$, and (2) becomes

full poststratification: $$\hat{\theta}^{PS} = \frac{\sum_{j=1}^{J} N_j \bar{y}_j}{\sum_{j=1}^{J} N_j},$$

which is equivalent to (3) with cell weights $W_j \propto N_j$ or unit weights $w_i \propto N_{j(i)} / n_{j(i)}$, where j(i) is the poststratification cell to which unit i belongs.

This estimate can also be viewed as a classical regression including indicators for all J poststratification cells.
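The equivalence between full poststratification and a weighted average can be checked numerically. The sketch below builds unit weights proportional to N_j(i)/n_j(i), renormalized to sum to n, on a hypothetical three-respondent sample; the function name is illustrative, and it assumes every cell with N_j > 0 has at least one respondent.

```python
def full_ps_unit_weights(cells, pop_counts):
    """Unit weights proportional to N_j(i) / n_j(i), rescaled to sum to n."""
    n = len(cells)
    n_j = {}
    for c in cells:
        n_j[c] = n_j.get(c, 0) + 1
    raw = [pop_counts[c] / n_j[c] for c in cells]
    scale = n / sum(raw)
    return [scale * r for r in raw]


# Hypothetical sample: two respondents in cell "a", one in cell "b".
cells = ["a", "a", "b"]
y = [1.0, 3.0, 10.0]
pop_counts = {"a": 600, "b": 400}
w = full_ps_unit_weights(cells, pop_counts)      # approximately [0.9, 0.9, 1.2]
weighted = sum(wi * yi for wi, yi in zip(w, y)) / len(y)
# Full poststratification: (600 * 2.0 + 400 * 10.0) / 1000 = 5.2, matching `weighted`.
```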

No weighting. The other extreme is no weighting, that is, unit weights $w_i = 1$ for all i, which is equivalent to poststratification if the cell estimates $\hat{\theta}_j$ are all equal to the sample mean $\bar{y}$, which in turn corresponds to classical regression including only a constant term.

Classical regression on cell characteristics. Inter- 
mediate cases of weighting can be obtained by re- 
gression models that include information about the 
poststratification cells without going to the extreme 
of fitting a least-squares predictor to each cell. For 
example, in the CBS/New York Times pre-election 
surveys, one could regress y on indicators for sex, 
ethnicity, age, education and region, without neces- 
sarily including all their interactions. 

Suppose the regression model is $y \sim N(X\beta, \sigma^2 I)$. We shall use X to represent the $n \times k$ matrix of predictors in the data, and $X^{pop}$ to represent the $J \times k$ matrix of predictors for the J poststratification cells. We also label the vector of poststratum populations as $N^{pop} = (N_1, \dots, N_J)$, with a sum of N.

The estimated vector of regression coefficients is then $\hat{\beta} = (X^T X)^{-1} X^T y$, and the estimated cell means are $X^{pop} \hat{\beta}$. The poststratified estimate of the population mean is then

classical regression: $$\hat{\theta}^{PS} = \frac{1}{N} \sum_{j=1}^{J} N_j \big( X^{pop} \hat{\beta} \big)_j \tag{5}$$

$$\hat{\theta}^{PS} = \frac{1}{N} (N^{pop})^T X^{pop} (X^T X)^{-1} X^T y, \tag{6}$$

which can be written as $\hat{\theta}^{PS} = \frac{1}{n} \sum_{i=1}^{n} w_i y_i$, with a vector of unit weights,

$$w = \Big( \frac{n}{N} (N^{pop})^T X^{pop} (X^T X)^{-1} X^T \Big)^T. \tag{7}$$

For convenience, we have renormalized these weights to sum to n (see below). In (7), w is a vector of length n that takes on at most J distinct values. The vector of J possible unit weights (corresponding to units in each of the J poststrata) is

$$w^{pop} = \Big( \frac{n}{N} (N^{pop})^T X^{pop} (X^T X)^{-1} (X^{pop})^T \Big)^T, \tag{8}$$

and the poststratified estimate can also be expressed as

$$\hat{\theta}^{PS} = \frac{1}{n} \sum_{j=1}^{J} n_j w^{pop}_j \bar{y}_j.$$
The key result that makes the above computations possible (that allows $\hat{\theta}^{PS}$ to be interpreted as a weighted average of data) is that the derived unit weights w in (7) sum to n. The identity $\sum_{i=1}^{n} w_i = n$ can be proved using matrix algebra but is more easily derived from an invariance in the classical regression model. With a least-squares regression, if a constant is added to all the data, that same constant will be added to the intercept, with the other coefficients not changing at all. Because every row of $X^{pop}$ includes the constant term, adding a constant to the intercept adds that same constant to every estimated cell mean and hence to $\hat{\theta}^{PS}$ in (5). We have thus established that adding a constant to each data point $y_i$ adds that same constant to $\hat{\theta}^{PS}$; thus, when $\hat{\theta}^{PS}$ is expressed as $\hat{\theta}^{PS} = \frac{1}{n} \sum_{i=1}^{n} w_i y_i$, these $w_i/n$'s must sum to 1.
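The weights in (7) and the sum-to-n identity can be checked numerically. The sketch below uses hypothetical data: six poststratification cells formed by sex (2) and age group (3), an additive model with no sex-by-age interaction, and made-up population counts. It computes $w = (n/N) X (X^T X)^{-1} (X^{pop})^T N^{pop}$, which is (7) written as a column vector; the function name is illustrative.

```python
import numpy as np


def regression_ps_weights(X, X_pop, N_pop):
    """Equation (7) as a column vector: w = (n/N) X (X^T X)^{-1} (X_pop)^T N_pop."""
    n = X.shape[0]
    return (n / N_pop.sum()) * (X @ np.linalg.solve(X.T @ X, X_pop.T @ N_pop))


# Cell predictors: intercept, female indicator, and indicators for age groups 2 and 3.
X_pop = np.array([[1, female, age == 1, age == 2]
                  for female in (0, 1) for age in (0, 1, 2)], dtype=float)
N_pop = np.array([150.0, 200.0, 130.0, 160.0, 210.0, 150.0])  # hypothetical N_j

rng = np.random.default_rng(1)
cell = rng.integers(0, 6, size=200)          # cell membership of each respondent
X = X_pop[cell]
y = rng.normal(loc=X @ np.array([1.0, 0.5, 0.2, -0.3]), scale=1.0)

w = regression_ps_weights(X, X_pop, N_pop)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
theta_ps = N_pop @ (X_pop @ beta_hat) / N_pop.sum()   # equation (5)
```

Because X contains a constant column, `w.sum()` equals n up to rounding, and `w @ y / n` reproduces `theta_ps`; w takes at most six distinct values, one per cell.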



The left panel of Figure 2 shows the unit weights 
obtained by fitting a sequence of classical regression 
models to the CBS/New York Times survey data. 
As more factors and interactions are included, the 
weights become more variable. 

3.2 Hierarchical Models 

We next consider the estimates that arise when applying the basic poststratification formula (2) when the cell means $\theta_j$ are estimated using hierarchical models. As we shall see, we can formulate the resulting $\hat{\theta}^{PS}$ as a weighted average as in (3). In the classical estimates we have just considered, the equivalent weights $w_i$ depend on the structure of the model and the values of the predictors X [see, e.g., (8)]. In contrast, with hierarchical models, we find that the $w_i$'s depend on the response variable y being analyzed; for example, the vector of weights for the question on the respondent's health will be different from the vector of weights for the respondent's perception of the public schools. In our analysis we shall suppose that a particular response variable y of interest has been selected (e.g., vote preference in the pre-election polls).

Hierarchical regression. The results in the previous section can be immediately generalized to multilevel regression models in which some of the coefficients are batches of indicator variables. We shall generalize the regression model to $y \sim \mathrm{N}(X\beta, \Sigma_y)$ with a prior distribution on $\beta$ of the form $\beta \sim \mathrm{N}(0, \Sigma_\beta)$. For simplicity, we assume independence of the components of $\beta$ in the prior distribution, conditional on hyperparameters for the variance components. [In practice, the covariance matrix would come from a fitted hierarchical model, and our analysis ignores uncertainty in the estimated hyperparameters. A fully Bayesian analysis would continue by averaging over the posterior distribution of $\Sigma_\beta$, which in turn would lead to a posterior distribution of equivalent weights, which might be summarized by a posterior mean, thus leading to posterior, or consensus, weights. Rao (2003) discusses this issue from a classical sampling-theory perspective.]

The prior precision matrix $\Sigma_\beta^{-1}$ is then diagonal, with zeroes for nonhierarchical regression coefficients (including the constant term in the regression). For example, consider a regression for the CBS/New York Times polls, with the following predictors:

• A constant term

• An indicator for sex (1 if female, 0 if male)

• An indicator for ethnicity (1 if black, 0 otherwise)

• Sex × ethnicity

• 4 indicators for age categories

• 4 indicators for education categories

• 16 age × education indicators.

[Figure 2: left panel "Weights for classical models", right panel "Weights for hierarchical Bayes models"; horizontal axis of each panel: model.]

Fig. 2. Equivalent unit weights $w_i$ for one of the CBS/New York Times surveys, based on a series of models fit first using classical regression and then using Bayesian hierarchical regression. The models are nested, controlling for (1) male/female, (2) also black/white, (3) also male/female × black/white, (4) also four age categories, (5) also four education categories, (6) also age × education and (7) also state indicators. Each model includes more factors and thus has more possible weights, which are renormalized to average to 1 for each model. For the Bayes models, the indicators for age, education, age × education and state are given independent batches of varying coefficients. For the classical weights, model (7) is not included because of collinearity. The lines in each graph connect the weights for individual respondents, which are divided into successively more categories as predictors are added to the models.

The classical regression has 1 + 1 + 1 + 1 + 3 + 3 + 
9 = 19 predictors (avoiding collinearity by excluding 
the baseline age and education categories). The hier- 
archical regression has 1 + 1 + 1 + 1 + 4 + 4 + 16 = 28 
predictors, and its prior precision matrix has the 
form 

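$$\Sigma_\beta^{-1} = \mathrm{Diag}\big(0, 0, 0, 0,\ \sigma_{\text{age}}^{-2}, \sigma_{\text{age}}^{-2}, \sigma_{\text{age}}^{-2}, \sigma_{\text{age}}^{-2},\ \sigma_{\text{edu}}^{-2}, \sigma_{\text{edu}}^{-2}, \sigma_{\text{edu}}^{-2}, \sigma_{\text{edu}}^{-2},\ \sigma_{\text{age.edu}}^{-2}, \dots, \sigma_{\text{age.edu}}^{-2}\big),$$

with the parameters $\sigma_{\text{age}}$, $\sigma_{\text{edu}}$ and $\sigma_{\text{age.edu}}$ estimated from data.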
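As a bookkeeping check on the predictor counts, the Python sketch below assembles the diagonal of $\Sigma_\beta^{-1}$ for the 28-predictor hierarchical regression, in the predictor order listed above, and recomputes the 19-predictor classical count. The function name and the hyperparameter values passed to it are hypothetical placeholders, not estimates from the survey.

```python
def prior_precision_diag(sigma_age, sigma_edu, sigma_age_edu):
    """Diagonal of Sigma_beta^{-1}, in the predictor order listed in the text."""
    unpooled = [0.0] * 4                    # constant, sex, ethnicity, sex x ethnicity
    age = [sigma_age ** -2] * 4             # one batch of 4 age coefficients
    edu = [sigma_edu ** -2] * 4             # one batch of 4 education coefficients
    age_edu = [sigma_age_edu ** -2] * 16    # 4 x 4 age x education interactions
    return unpooled + age + edu + age_edu


# Classical version: drop the baseline age and education categories.
n_classical = 1 + 1 + 1 + 1 + (4 - 1) + (4 - 1) + (4 - 1) * (4 - 1)

diag = prior_precision_diag(0.5, 0.4, 0.2)  # hypothetical hyperparameter values
print(n_classical, len(diag))  # 19 28
```

Because the baseline categories stay in the hierarchical model, the zero entries are exactly the four coefficients that receive flat priors.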

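The estimated vector of regression coefficients is then $\hat\beta = (X^t\Sigma_y^{-1}X + \Sigma_\beta^{-1})^{-1}X^t\Sigma_y^{-1}y$, and expressions (6)-(8) become

Bayes poststratification:

$$\hat\theta^{PS} = \frac{1}{N}(N^{pop})^t X^{pop}\left(X^t\Sigma_y^{-1}X + \Sigma_\beta^{-1}\right)^{-1}X^t\Sigma_y^{-1}y,$$

$$w = \left(\frac{n}{N}(N^{pop})^t X^{pop}\left(X^t\Sigma_y^{-1}X + \Sigma_\beta^{-1}\right)^{-1}X^t\Sigma_y^{-1}\right)^t, \tag{9}$$

$$w^{pop} = \left(\frac{n}{N}(N^{pop})^t X^{pop}\left(X^t\Sigma_y^{-1}X + \Sigma_\beta^{-1}\right)^{-1}(X^{pop})^t\,\sigma_y^{-2}\right)^t,$$

where the last line takes $\Sigma_y = \sigma_y^2 I$, so that all units in a poststratification cell share the same weight.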

Conditional on the variance parameters in $\Sigma_y$ and $\Sigma_\beta$, then, estimates from this model correspond to weighted averages.
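A minimal Python sketch of the Bayes weights (9), under the simplifying assumption $\Sigma_y = \sigma_y^2 I$ and for a hypothetical one-factor model (an intercept with a flat prior plus one indicator per cell, the indicators forming a single batch, as with the state effects in model 7). The counts, the values of $\sigma_\beta$ and the function names are made up for illustration. With zero prior precision on the intercept the unit weights still sum to $n$; as $\sigma_\beta$ shrinks they approach the completely smoothed weight 1, and as $\sigma_\beta$ grows they approach the full-poststratification weights.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def bayes_cell_weights(rows, n_cells, N_cells, prior_prec, sigma_y):
    """Cell weights w^pop from (9), assuming Sigma_y = sigma_y^2 I.

    prior_prec holds the diagonal of Sigma_beta^{-1}; its intercept entry is 0.
    """
    k = len(rows[0])
    n, N = sum(n_cells), sum(N_cells)
    s2 = sigma_y ** 2
    # X^t Sigma_y^{-1} X + Sigma_beta^{-1}, accumulated cell by cell
    a = [[sum(nj * r[p] * r[q] for r, nj in zip(rows, n_cells)) / s2
          + (prior_prec[p] if p == q else 0.0)
          for q in range(k)] for p in range(k)]
    v = [sum(Nj * r[p] for r, Nj in zip(rows, N_cells)) for p in range(k)]  # (X^pop)^t N^pop
    b = solve(a, v)  # a is symmetric, so x_j . b = (N^pop)^t X^pop a^{-1} x_j^t
    return [n / N * sum(r[p] * b[p] for p in range(k)) / s2 for r in rows]


J = 4
n_cells = [30, 10, 40, 20]        # hypothetical sample counts n_j
N_cells = [400, 100, 350, 150]    # hypothetical population counts N_j
# Intercept plus one indicator per cell; the indicators form one batch.
rows = [[1] + [1 if g == j else 0 for g in range(J)] for j in range(J)]


def weights(sigma_beta, sigma_y=1.0):
    prec = [0.0] + [sigma_beta ** -2] * J   # flat prior on the intercept
    return bayes_cell_weights(rows, n_cells, N_cells, prec, sigma_y)


for sigma_beta in (1e-4, 1.0, 100.0):
    w = weights(sigma_beta)
    total = sum(nj * wj for nj, wj in zip(n_cells, w))
    print(sigma_beta, [round(wj, 3) for wj in w], round(total, 6))
```

The loop prints one row of cell weights per value of $\sigma_\beta$, moving from nearly uniform weights to nearly full-poststratification weights while the total stays at $n$.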

The right panel of Figure 2 shows the unit weights obtained by fitting a sequence of Bayesian models to the CBS/New York Times poll. The first three models are actually identical to the classical (nonhierarchical) versions, since we assign noninformative uniform prior distributions to the coefficients for sex, ethnicity and their interactions. Models 4 and 5 are similar to the classical fits because age and education have only four categories, so there is little information available for partial pooling of these effects (see Gelman, 2006). The weights in model 6, with age × education interactions included, are smoothed somewhat compared to the corresponding classical model. Finally, introducing state effects leads to a downweighting of some of the respondents in states that happen to be overrepresented in the survey, and an upweighting of respondents in the undersampled states. There is no corresponding classical model here because the survey does not actually include data from all 50 states.

Exchangeable normal model. To understand these formulas better, we consider the special case of an exchangeable normal model for the $J$ cell means (see also Lazzeroni and Little, 1998; Elliott and Little, 2000). This model can be expressed in terms of the cell means:

$$\bar y_j \sim \mathrm{N}(\theta_j,\ \sigma_y^2/n_j), \qquad \theta_j \sim \mathrm{N}(\mu,\ \sigma_\theta^2), \qquad j = 1,\dots,J,$$

with a flat prior distribution on $\mu$.

This is a special case of the hierarchical regression model discussed above, so we already know that the poststratified estimate, conditional on the (estimated) variance parameters $\sigma_y, \sigma_\theta$, can be expressed as a weighted average of the cell means, $\bar y_j$, or equivalently as a weighted average of the data points $y_i$.

In this simple example, however, we can gain some understanding by deriving algebraic expressions for the weights. Our goal is to express them in terms of the completely smoothed weights, $w_j = 1$, and the weights from full poststratification,

$$w_j = \frac{N_j/N}{n_j/n}.$$

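We start with the posterior means (conditional on the variance parameters) of the cell means. We write these as $\hat\theta_k$, $k = 1, \dots, J$ (using $k$ as a subscript rather than $j$ because this results in more convenient notation later):

$$\hat\theta_k = \frac{(n_k/\sigma_y^2)\,\bar y_k + (1/\sigma_\theta^2)\,\hat\mu}{n_k/\sigma_y^2 + 1/\sigma_\theta^2}, \tag{10}$$

where

$$\hat\mu = \frac{\sum_{k=1}^J \bar y_k/(\sigma_y^2/n_k + \sigma_\theta^2)}{\sum_{k=1}^J 1/(\sigma_y^2/n_k + \sigma_\theta^2)}. \tag{11}$$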



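We can combine (10) and (11) to express each $\hat\theta_k$ as a linear combination of the cell means $\bar y_j$,

$$\hat\theta_k = \sum_{j=1}^J c_{kj}\,\bar y_j.$$

After some algebra, we can write these coefficients as

$$c_{kj} = \begin{cases} \dfrac{\sigma_y^2}{n_k}\,\dfrac{A_k A_j}{A}, & \text{for } j \neq k,\\[1.5ex] \sigma_\theta^2 A_k + \dfrac{\sigma_y^2}{n_k}\,\dfrac{A_k^2}{A}, & \text{for } j = k,\end{cases}$$

where

$$A_k = \frac{1}{\sigma_y^2/n_k + \sigma_\theta^2}, \qquad A = \sum_{k=1}^J A_k.$$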
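To check the algebra for $c_{kj}$, the Python sketch below computes $\hat\theta_k$ directly from (10) and (11) and again as $\sum_j c_{kj}\bar y_j$. The cell means, sample sizes and variance parameters are hypothetical, and the function names are illustrative. Each row of $c$ sums to 1, since $\sigma_\theta^2 A_k + (\sigma_y^2/n_k)A_k = 1$, so each $\hat\theta_k$ is itself a weighted average of the cell means.

```python
def posterior_cell_means(ybar, n, sigma_y, sigma_theta):
    """theta-hat_k from (10), with mu-hat from (11), given the variance parameters."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    A = [1.0 / (s2y / nk + s2t) for nk in n]                 # A_k
    mu = sum(a * y for a, y in zip(A, ybar)) / sum(A)        # (11)
    return [((nk / s2y) * y + mu / s2t) / (nk / s2y + 1.0 / s2t)
            for nk, y in zip(n, ybar)]                       # (10)


def shrinkage_matrix(n, sigma_y, sigma_theta):
    """Coefficients c[k][j] such that theta-hat_k = sum_j c[k][j] * ybar_j."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    A = [1.0 / (s2y / nk + s2t) for nk in n]
    A_tot = sum(A)
    J = len(n)
    return [[(s2t * A[k] if j == k else 0.0) + (s2y / n[k]) * A[k] * A[j] / A_tot
             for j in range(J)] for k in range(J)]


ybar = [0.52, 0.61, 0.47, 0.55]   # hypothetical cell means
n = [30, 10, 40, 20]              # hypothetical cell sample sizes
sigma_y, sigma_theta = 0.5, 0.05  # hypothetical variance parameters

theta = posterior_cell_means(ybar, n, sigma_y, sigma_theta)
c = shrinkage_matrix(n, sigma_y, sigma_theta)
theta_linear = [sum(ckj * yj for ckj, yj in zip(row, ybar)) for row in c]
print([round(t, 4) for t in theta])
print([round(t, 4) for t in theta_linear])  # agrees with theta up to rounding error
```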



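The payoff now comes in computing the poststratified estimate,

$$\hat\theta^{PS} = \sum_{k=1}^J \frac{N_k}{N}\,\hat\theta_k = \sum_{k=1}^J\sum_{j=1}^J \frac{N_k}{N}\,c_{kj}\,\bar y_j,$$

and equating this to $\sum_{j=1}^J W_j\bar y_j$, thus deriving the cell weights,

$$W_j = \sum_{k=1}^J \frac{N_k}{N}\,c_{kj} = \frac{N_j}{N}\,\sigma_\theta^2 A_j + \frac{A_j}{A}\sum_{k=1}^J \frac{N_k}{N}\,\frac{\sigma_y^2}{n_k}\,A_k.$$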



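The implicit unit weights are then $w_j = (n/n_j)W_j$, or

$$w_j = \frac{n A_j}{N n_j}\left(\sigma_\theta^2 N_j + \sigma_y^2\,\frac{\sum_{k=1}^J N_k/(\sigma_y^2 + n_k\sigma_\theta^2)}{\sum_{k=1}^J n_k/(\sigma_y^2 + n_k\sigma_\theta^2)}\right). \tag{12}$$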
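The Python sketch below evaluates the exact unit weights (12) and checks two consequences: the weighted average $\frac{1}{n}\sum_j n_j w_j \bar y_j$ reproduces $\sum_k (N_k/N)\hat\theta_k$, and $\sum_j n_j w_j = n$, as the hierarchical-regression argument implies. All inputs are hypothetical and the function names are illustrative.

```python
def posterior_cell_means(ybar, n, sigma_y, sigma_theta):
    """theta-hat_k from (10), with mu-hat from (11)."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    A = [1.0 / (s2y / nk + s2t) for nk in n]
    mu = sum(a * y for a, y in zip(A, ybar)) / sum(A)
    return [((nk / s2y) * y + mu / s2t) / (nk / s2y + 1.0 / s2t)
            for nk, y in zip(n, ybar)]


def unit_weights_exact(n, N, sigma_y, sigma_theta):
    """Implicit unit weights w_j from (12), given the variance parameters."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    n_tot, N_tot = sum(n), sum(N)
    # Ratio of sums in (12)
    ratio = (sum(Nk / (s2y + nk * s2t) for nk, Nk in zip(n, N))
             / sum(nk / (s2y + nk * s2t) for nk in n))
    weights = []
    for nj, Nj in zip(n, N):
        A_j = 1.0 / (s2y / nj + s2t)
        weights.append(n_tot * A_j / (N_tot * nj) * (s2t * Nj + s2y * ratio))
    return weights


ybar = [0.52, 0.61, 0.47, 0.55]   # hypothetical cell means
n = [30, 10, 40, 20]              # hypothetical sample sizes n_j
N = [400, 100, 350, 150]          # hypothetical population sizes N_j
sigma_y, sigma_theta = 0.5, 0.05  # hypothetical variance parameters

theta = posterior_cell_means(ybar, n, sigma_y, sigma_theta)
ps_from_model = sum(Nk * t for Nk, t in zip(N, theta)) / sum(N)   # sum_k (N_k/N) theta-hat_k
w = unit_weights_exact(n, N, sigma_y, sigma_theta)
ps_from_weights = sum(nj * wj * yj for nj, wj, yj in zip(n, w, ybar)) / sum(n)
print(round(ps_from_model, 6), round(ps_from_weights, 6))
print(round(sum(nj * wj for nj, wj in zip(n, w)), 6))  # sum of unit weights = n
```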

The ratio of sums in (12) is a constant (given the fitted model) that does not depend on $j$. Let us approximate it by $N/n$ (which is appropriate if the sample proportions $n_k/N_k$ are independent of the group sizes $N_k$). Under this approximation, the unit weights can be written as



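$$w_j^{\text{approx}} = \frac{\dfrac{n_j}{\sigma_y^2}\cdot\dfrac{N_j/N}{n_j/n} + \dfrac{1}{\sigma_\theta^2}\cdot 1}{\dfrac{n_j}{\sigma_y^2} + \dfrac{1}{\sigma_\theta^2}}, \tag{13}$$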



which is a weighted average of the full poststratification unit weight, $(N_j/N)/(n_j/n)$, and the completely smoothed weight of 1. Hierarchical poststratification is thus approximately equivalent to a shrinkage of weights by the same factors as in the shrinkage of the parameter estimates (10).

Thus, as with hierarchical regression models in 
general, the amount of shrinkage of the weights de- 
pends on the between- and within-stratum variance 
in the outcome of interest, y. 
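The Python sketch below puts the exact weights (12) next to the approximation (13). All numbers are hypothetical and the function names are illustrative. Under proportional sampling, where $n_k/N_k$ is the same in every cell, the ratio in (12) equals $N/n$ exactly, so both formulas give the same weights (here all equal to 1); in general, (13) always lies between the full-poststratification weight and 1.

```python
def unit_weights_exact(n, N, sigma_y, sigma_theta):
    """Exact implicit unit weights (12) for the exchangeable normal model."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    n_tot, N_tot = sum(n), sum(N)
    ratio = (sum(Nk / (s2y + nk * s2t) for nk, Nk in zip(n, N))
             / sum(nk / (s2y + nk * s2t) for nk in n))
    weights = []
    for nj, Nj in zip(n, N):
        A_j = 1.0 / (s2y / nj + s2t)
        weights.append(n_tot * A_j / (N_tot * nj) * (s2t * Nj + s2y * ratio))
    return weights


def unit_weights_approx(n, N, sigma_y, sigma_theta):
    """Approximate weights (13): shrink full-poststratification weights toward 1."""
    s2y, s2t = sigma_y ** 2, sigma_theta ** 2
    n_tot, N_tot = sum(n), sum(N)
    weights = []
    for nj, Nj in zip(n, N):
        full_ps = (Nj / N_tot) / (nj / n_tot)        # full poststratification weight
        data_prec, prior_prec = nj / s2y, 1.0 / s2t  # shrinkage factors as in (10)
        weights.append((data_prec * full_ps + prior_prec * 1.0) / (data_prec + prior_prec))
    return weights


# Proportional sampling: n_k / N_k = 0.1 in every cell (hypothetical numbers).
n_prop, N_prop = [10, 20, 30], [100, 200, 300]
exact_prop = unit_weights_exact(n_prop, N_prop, 0.5, 0.05)
approx_prop = unit_weights_approx(n_prop, N_prop, 0.5, 0.05)

# Unequal sampling rates (hypothetical numbers).
n, N = [30, 10, 40, 20], [400, 100, 350, 150]
exact = unit_weights_exact(n, N, 0.5, 0.05)
approx = unit_weights_approx(n, N, 0.5, 0.05)
full_ps = [(Nj / sum(N)) / (nj / sum(n)) for nj, Nj in zip(n, N)]
for e, a, f in zip(exact, approx, full_ps):
    print(f"exact {e:.3f}  approx {a:.3f}  full poststratification {f:.3f}")
```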

Other hierarchical models. Lazzeroni and Little 
(1998) and Elliott and Little (2000) discuss various 
hierarchical linear regression models, including com- 
binations of the two models described above (i.e., 
a hierarchical regression with a cell-level variance 
component) and models with correlations between 
adjacent cell categories for ordered predictors. 

Another natural generalization is to use logistic regression for binary outcomes. Unfortunately, when we move away from linear regression, we abandon the translation invariance of the parameter estimates (i.e., the property that adding a constant to all the data affects only the constant term and none of the other regression coefficients). As a result, for logistic regression, the poststratified estimate $\hat\theta^{PS}$ is no longer a weighted average of the data, even after controlling for the variance parameters in the model. However, we suspect that the model could be linearized, yielding approximate weights.

3.3 Properties of the Model-Based 
Poststratified Estimates 

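Standard errors. The variance of the poststratified estimate, ignoring sampling variation in $X$, can be expressed using various formulas,

$$\begin{aligned}\operatorname{var}(\hat\theta^{PS}) &= \frac{1}{n^2}\sum_{i=1}^n w_i^2\,\sigma_y^2\\ &= \frac{1}{n^2}\sum_{j=1}^J n_j\,(w^{pop}_j)^2\,\sigma_y^2\\ &= \frac{1}{N^2}(N^{pop})^t X^{pop}(X^tX)^{-1}(X^{pop})^t N^{pop}\,\sigma_y^2.\end{aligned}$$

Any of these equivalent expressions can be viewed as the posterior variance of $\theta$ given a noninformative prior distribution on the regression coefficients, and ignoring posterior uncertainty in $\sigma_y$ (Little, 1993).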
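The three expressions for the variance can be compared numerically for a classical regression. This Python sketch uses a hypothetical 2 × 2 main-effects example (made-up counts, an assumed $\sigma_y = 0.5$, and a hand-written solver in place of a linear-algebra library) and evaluates all three forms; they agree because $\sum_i w_i^2 = (n/N)^2 (N^{pop})^t X^{pop}(X^tX)^{-1}(X^{pop})^t N^{pop}$ for the weights (7).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


# Hypothetical 2 x 2 example: cells indexed by (female, black), main-effects model.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
rows = [[1, f, b] for f, b in cells]
n_cells = [30, 10, 40, 20]        # hypothetical sample counts n_j
N_cells = [400, 100, 350, 150]    # hypothetical population counts N_j
sigma_y = 0.5                     # assumed residual standard deviation
k, n, N = len(rows[0]), sum(n_cells), sum(N_cells)

xtx = [[sum(nj * r[p] * r[q] for r, nj in zip(rows, n_cells)) for q in range(k)]
       for p in range(k)]
v = [sum(Nj * r[p] for r, Nj in zip(rows, N_cells)) for p in range(k)]  # (X^pop)^t N^pop
beta_part = solve(xtx, v)                                               # (X^t X)^{-1} v
w_pop = [n / N * sum(r[p] * beta_part[p] for p in range(k)) for r in rows]  # (8)
w_units = [wj for wj, nj in zip(w_pop, n_cells) for _ in range(nj)]     # one per respondent

var_units = sum(wi ** 2 for wi in w_units) * sigma_y ** 2 / n ** 2
var_cells = sum(nj * wj ** 2 for nj, wj in zip(n_cells, w_pop)) * sigma_y ** 2 / n ** 2
var_matrix = sum(vp * bp for vp, bp in zip(v, beta_part)) * sigma_y ** 2 / N ** 2
print(var_units, var_cells, var_matrix)
```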

Dependence of implicit weights on the outcome variable. Classical survey weights depend only on the $n_j$'s and the $N_j$'s, as well as the design matrix $X$ (used, e.g., to define the margins used in raking), but do not formally depend on $y$. (There is an informal dependence on $y$ in the sense that there is no urgency to weight on variables $X$ that do not help predict outcomes $y$ of interest.) Similarly, the implicit weights (7) obtained from a classical regression model depend only on the $n_j$'s, the $N_j$'s and $X$, not on $y$.

However, the implicit weights (9) from hierarchical regression do depend on the data, implicitly, through the hyperparameters in $\Sigma_y$ and $\Sigma_\beta$, which are estimated from the data. Thus, the appropriate weights could differ for different survey responses.

4. WHERE TO GO NEXT 

There are currently two standard approaches to 
adjusting for known differences between sample and 
population in survey data: weighting and regression 
modeling. 

Practical limitations of weighting. The weighting 
approach has the advantage of giving simple esti- 
mates for population averages but has several dis- 
advantages. First, it is not generally clear how to 
apply weights to more complicated estimands such 
as regression coefficients. There has been some work 
on weighted regression for surveys (e.g., DuMouchel 
and Duncan, 1983; Pfeffermann, 1993) but these 
procedures are not very flexible, which is one rea- 
son why the modeling approach is more popular 
for problems such as small-area estimation (Fay and 
Herriot, 1979). A second problem with weighted es- 
timates is that standard errors are more difficult to 
evaluate (recall Table 3). Finally, weighting may be 
"dirty" but it is not always "quick": actually con- 
structing the weighting for a survey is more difficult 
than you might think. Creating practical weights re- 
quires arbitrary choices about inclusion of weighting 
factors and interactions, pooling of weighting cells 
and truncation of weights. (For example, in the So- 
cial Indicators Survey, we decided to weight on some 
interactions and not others in order to control vari- 
ability of the weights. While setting up the weight- 
ing procedure, we repeatedly compared weighted es- 
timates to Census values for various outcomes that 
we thought could be "canaries in the coal mine" if 
the survey estimate did not fit the population. These 
"canary" variables included percentage of New York 
City residents who are U.S. citizens, the percent who 
own their own home and income quintiles.) The re- 
sulting vector of weights is in general a complicated 
and not-fully-specified function of data and prior 
knowledge. Subjective choices arise in virtually all 
statistical methods, of course, but good advice on 
creating weights tends to be much vaguer than for 
other methods in the statistical literature (see, e.g., 
Lohr, 1999). 
Practical limitations of modeling. Regression modeling is easy to do (even hierarchical regression is becoming increasingly easy in Bugs, Stata and other software packages; see, e.g., Centre for Multilevel Modelling, 2005; Gelman and Hill, 2007), but for analysis of survey data it has the disadvantage that,
to combine with population information, the regres- 
sion must theoretically condition on all the post- 
stratification cells, which can lead to very compli- 
cated models — more complicated than we are com- 
fortable with in current statistical practice — even in 
surveys of moderate size (see Section 2.2). When 
a model is too complicated, it becomes difficult to 
interpret or use the results, leading to awkward sit- 
uations such as in Table 1, where we simply cannot 
trust the regression coefficients for time trends in 
the Social Indicators Survey. 

It is a delicate point, because sometimes we do 
have confidence in regression coefficients, even with 
complicated hierarchical models with many parame- 
ters. For example, as discussed in Gelman and Car- 
lin (2002) and Park, Gelman and Bafumi (2004), 
hierarchical regression combined with poststratifi- 
cation performs excellently at estimating state-level 
opinions from the national CBS/New York Times 
polls. So it is not just the number of parameters 
that is important, but rather some connection be- 
tween the model and the quantities of interest, which 
is somehow more difficult to establish in the models 
whose results are shown in Table 1. 

Putting it together using hierarchical models and 
poststratification. Our ideal procedure should be as 
easy to use as hierarchical modeling, with popula- 
tion information included using poststratification as 
in (1). The procedure should feature a smooth transition from classical weighting so that when different estimation methods give different results, it is possible to understand this difference as a result of interactions in the model (as discussed by Graubard and Korn, 2002).

How do we get there? One place to start is to fo- 
cus on examples such as in Table 1 where different 
methods give different answers, and try to figure out 
which, if either, of the two estimates makes sense. A 
parallel approach is through simulation studies — for 
greater realism, these can often be constructed using 
subsamples of actual surveys — as well as theoretical 
studies of the bias and variance of poststratified es- 
timates with moderate sample sizes. In addition, a 
full hierarchical modeling approach should be able 



to handle cluster sampling (which we have not con- 
sidered in this article) simply as another grouping 
factor. 

We would like a general modeling procedure that 
gives believable estimates for time trends and as a 
byproduct produces a good set of weights that can 
be used for simple estimands. Given the difficulties 
with current methods for weighting and modeling, 
we believe this approach is of both practical and 
theoretical interest. 